Methods and systems for a closest match search

ABSTRACT

A system to generate an index for a closest match search is described. The system receives a corpus of information that includes member information. The system parses the member information to generate signatures for each of the members and stores the signatures in the index. The signatures are unique to the members. Accordingly, the signatures signify the respective members. The system subsequently utilizes the index to identify input information that matches signatures in the index to identify a closest match of the input information to one or more members in the corpus information.

RELATED APPLICATIONS

This application is a continuation application which claims the priority benefits of U.S. application Ser. No. 13/682,363, a continuation application, filed Nov. 20, 2012, which claims the priority benefits of U.S. application Ser. No. 12/605,225, filed Oct. 23, 2009, which claims the priority benefits of U.S. Provisional Application No. 61/228,103, filed Jul. 23, 2009, all of which are incorporated herein by reference in their entirety.

FIELD

Embodiments relate generally to the technical field of data communications and, in one example embodiment, to a closest match search.

BACKGROUND

An item may be identified as most closely matched to one or more known items. Such information may be helpful to determine whether a particular item resembles one or more known items. For example, an item that is listed for sale on a network-based marketplace may be identified as most closely matched to a known product from a catalogue of products. Improving the accuracy and efficiency of such identifications is a challenge to the present technology.

BRIEF DESCRIPTION OF DRAWINGS

The present disclosure is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is a diagram depicting a sequence of operations, according to one example embodiment, to execute a closest match search;

FIG. 2 is a diagram depicting a sequence of operations, according to one example embodiment, to execute a closest match search;

FIG. 3 is a diagram depicting a sequence of operations utilizing an index, according to one example embodiment, to execute a closest match search;

FIG. 4 is a network diagram depicting a system, according to one example embodiment, to execute a closest match search;

FIG. 5 is a block diagram illustrating marketplace and payment applications, according to an embodiment;

FIG. 6A is a block diagram illustrating listing classification applications, according to an embodiment;

FIG. 6B is a block diagram illustrating a product autotagger indexer module, according to an embodiment;

FIG. 6C is a block diagram illustrating a maximum signature matching engine, according to an embodiment;

FIG. 7A is a block diagram illustrating tables, according to an embodiment;

FIG. 7B is a block diagram illustrating an items table, according to an embodiment;

FIG. 7C is a block diagram illustrating listing information, according to an embodiment;

FIG. 8A is a block diagram illustrating corpus information, according to an embodiment;

FIG. 8B is a block diagram illustrating standard information, according to an embodiment;

FIG. 9A is a block diagram illustrating an entity set, according to an embodiment;

FIG. 9B is a block diagram illustrating a feature set, according to an embodiment;

FIG. 9C is a block diagram illustrating a candidate signature set, according to an embodiment;

FIG. 9D is a block diagram illustrating an index signature set, according to an embodiment;

FIG. 10A is a block diagram illustrating index information, according to an embodiment;

FIG. 10B is a block diagram illustrating an index, according to an embodiment;

FIG. 11A is a block diagram illustrating input information, according to an embodiment;

FIG. 11B is a block diagram illustrating an input feature, according to an embodiment;

FIG. 11C is a block diagram illustrating an input signature, according to an embodiment;

FIG. 12 is a block diagram illustrating a method, according to an embodiment, to generate an index for a closest match search;

FIG. 13 is a block diagram illustrating a method, according to an embodiment, to utilize an index to identify a closest match; and

FIG. 14 is a block diagram of a machine, according to an example embodiment, including instructions to perform any one or more of the methodologies described herein.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of an embodiment of the present disclosure. It will be evident, however, to one of ordinary skill in the art that the present disclosure may be practiced without these specific details.

Closest Match Search Problems

FIG. 1 is a diagram depicting a flow chart illustrating a sequence of operations 11, according to one example embodiment, to execute a closest match search. The sequence of operations 11 may be applied to a closest match search problem. The closest match search problem may be defined as finding an item(s) in a corpus (e.g., documents, web pages, listings, data items, etc.) which most closely resembles an input, with the finding based on a confidence score. Examples of the closest match search problems may include the following:

1. Document clustering/classification: given a document, find the document (or document class) in the corpus which most closely resembles the given document;
2. Attribute Extraction: given a listing title, find the attribute values in a catalog (or a list of values) which most closely resemble sections of the listing title. For example, in one embodiment, a listing may include listing information (e.g., text, image, picture, Uniform Resource Locator, etc.) that is descriptive of an item or service that is offered for sale or auction on a network-based marketplace. In one embodiment, the listing information may include a title.
3. Product Tagging: given a listing, find the product in a catalog which most closely resembles the listing.

The sequence of operations 11 is shown to include input information that may be compared to corpus information in order to generate output information. The corpus information may include member information (A) through (E). The sequence of operations 11 may compare (operation 13) the input information to the respective member information to generate (operation 15) the output information. The output information may include member information that most closely matches the input information. For example, the members A and D most closely match the input information, with the member A being associated with a confidence score of 0.99 and the member D being associated with a confidence score of 0.67. Accordingly, the sequence of operations 11 may identify the member A as most closely matched to the input information and the member D as the next most closely matched to the input information. Further, the output information may include measures of confidence.

FIG. 2 is a diagram depicting a sequence of operations 19, according to one example embodiment, to execute a closest match search. The sequence of operations 19 provides a further example of a closest match search. The sequence of operations 19 is shown to include input information that may be compared (operation 21) to corpus information to generate (operation 23) output information. In one embodiment, the input information may include a listing describing an item for sale on a network-based marketplace. For example, the listing illustrated is for a camera and includes a title, “Canon EOS Rebel XSi 12.2 Megapixel.” Further illustrated are attributes that may be embodied as name-value (NV) pairs. For example, a first NV pair is illustrated as “Brand=Canon” and a second NV pair is illustrated as “Model=EOS Rebel XSi.” Other embodiments may not include NV pairs. The corpus information is shown to include a catalog of camera information. Each entry corresponds to a camera and may include an identifier, a title and one or more attributes (NV pairs). The output information may include member information that most closely matches the input information. For example, the members associated with member identifier 12345678 and member identifier 12345679 most closely match the input information, with the member 12345678 being associated with a confidence score of 0.99 and the member 12345679 being associated with a confidence score of 0.67. Accordingly, the sequence of operations 19 may identify the member 12345678 as most closely matched to the input information and the member 12345679 as the next most closely matched to the input information.

FIG. 3 is a diagram depicting a sequence of operations 25, according to one example embodiment, to execute a closest match search. The sequence of operations 25 differs from the prior two sequences of operations by utilizing an index. Utilization of an index may include the following:

1. define a measure of similarity;
2. index the features found in the target candidates (e.g., text tokens in documents or product titles, or non-textual attribute values such as dates, prices or colors);
3. search for candidates which contain the features in the input; and
4. calculate a score for each of the candidates found in step 3 using the measure defined in step 1.

For example, one approach may use tokens as features. A token is an atomic unit of text (e.g., word, punctuation, etc.). Another approach may use non-textual attribute values such as dates, prices or colors as features. Indeed, a feature may be any entity, combination, or sequence of entities, associated with the target candidates. For example, a consecutive sequence of two text tokens (often known as bi-grams) can be a feature, and so can a combination of a price and a date.

A feature may be a single entity or a combination or a sequence of multiple entities. Further, the extent to which matched features overlap the length of the input information may constitute a similarity measure, according to one embodiment. For example, a set of features that completely overlaps the input information may constitute a similarity measure of 100%.
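The four index-utilization steps listed above can be sketched as follows. This is a minimal illustration only, assuming whitespace-separated tokens as features and coverage of the input as the similarity measure; the function names (`tokenize`, `build_index`, `closest_matches`) and the two-document corpus are hypothetical and do not come from the disclosure.

```python
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_index(corpus):
    # Step 2: map each token (feature) to the set of documents containing it.
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def closest_matches(index, input_text):
    # Step 3: collect candidates that share at least one token with the input.
    # Step 4: score each candidate by the fraction of distinct input tokens
    # it contains (the similarity measure chosen in step 1).
    tokens = set(tokenize(input_text))
    hits = defaultdict(int)
    for token in tokens:
        for doc_id in index.get(token, ()):
            hits[doc_id] += 1
    scores = {doc_id: n / len(tokens) for doc_id, n in hits.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

corpus = {"D1": "w1 w2 w3", "D2": "w2 w4"}
index = build_index(corpus)
results = closest_matches(index, "w1 w2")
print(results)
```

Under these assumptions, a document that contains every token of the input receives a score of 1.0, which corresponds to the 100% similarity measure described above.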

The sequence of operations 25 illustrates the utilization of corpus information to generate (operation 27) an index that may be subsequently utilized for a comparison (operation 29) with input information to generate candidate information that is subsequently utilized to identify (operation 31) output information. For example, the corpus information may include multiple documents (e.g., D1, D2, etc.) that respectively include tokens in the form of words (e.g., w1, w2, etc.). The index may be used to map the words to the documents that contain the words (e.g., w1→D1). The output information includes the most closely matched document from the corpus information based on scores. The scores represent the coverage of the input information (e.g., Di) by the words in the respective document from the corpus information. For example, Di is illustrated as 100% covered by D1 because D1 contains all of the words in Di. The success of the sequence of operations 25 is dependent on the ability of the measure of similarity to accurately represent the actual degree of similarity, which, in turn, depends on:

1. what the features are, and
2. how the score is calculated (e.g., how the features are weighted and how the scores from all the features found are combined; common weighting methods may include inverse document frequency (IDF)).

Features that may be used to Measure Similarity

Consider the following example input information: “EOS Rebel XSi.” Features that may be used to measure similarity with the example input information may include:

-   Unigrams (“EOS,” “Rebel,” “XSi”)
-   (Consecutive) Bi-grams (“EOS Rebel”, “Rebel XSi”, . . . )
-   Non-consecutive Bi-grams (“EOS”+“Rebel”, “EOS”+“XSi”, . . . , “Rebel”+“XSi”, . . . )
-   (Consecutive) Trigrams
-   Non-consecutive Trigrams
-   (Consecutive) N-grams
-   Non-consecutive N-grams
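As an illustration of the feature types listed above, the following sketch generates consecutive and non-consecutive n-grams for the example input. It assumes whitespace tokenization and, following the example list, treats a non-consecutive n-gram as any order-preserving choice of n tokens whether or not they are adjacent; the function names are hypothetical.

```python
from itertools import combinations

def consecutive_ngrams(tokens, n):
    # Every run of n adjacent tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def non_consecutive_ngrams(tokens, n):
    # Every order-preserving choice of n tokens; the tokens need not be adjacent.
    return list(combinations(tokens, n))

tokens = "EOS Rebel XSi".split()
unigrams = consecutive_ngrams(tokens, 1)
bigrams = consecutive_ngrams(tokens, 2)
loose_bigrams = non_consecutive_ngrams(tokens, 2)

print(unigrams)
print(bigrams)
print(loose_bigrams)
```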

In general, M-grams function better for measuring similarity than N-grams for any M>N. M-grams function better than N-grams because, for any M>N, the similarity measures using M-grams as features are less prone to score estimation error compared to those using N-grams. Calculating the score of a candidate based on multiple feature matches may include combining the scores of the individual matches with the assumption that they are conditionally independent. Combining scores in this manner may result in the inflation of the scores of input with multiple matches of related features (e.g., the main shortcoming of the Naïve Bayesian approach). For example, from a Bayesian point of view, for cameras, a match of the feature “EOS” should not provide any additional evidence if the feature “Rebel” is also found (since “Rebel” implies “EOS”). On the other hand, if the bi-gram “EOS”+“Rebel” is a feature itself, then no combination of scores from individual matches is needed, and a major source of score estimation error is avoided. Accordingly, M-grams function better for measuring similarity than N-grams for any M>N.

The Reason N-Grams are not Used

In prior art systems, uni-grams and, at most, bi-grams may be used as features due to the issue of scalability. For example, N-grams with N greater than two may not be used because, for a vocabulary of X distinct words, the number of possible N-grams scales as follows:

-   Number of possible unigrams=X
-   Number of possible bi-grams=X²
-   Number of possible tri-grams=X³
-   Number of possible n-grams=X^(n)

In other words, the size of an index increases exponentially as N increases. Accordingly, using N-grams with N greater than two may result in memory requirements that are prohibitively large and access delays that are prohibitively long.
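The growth described above can be made concrete with a short calculation. The vocabulary size of 10,000 words is an assumed figure chosen only for illustration, and `possible_ngrams` is a hypothetical helper name.

```python
def possible_ngrams(vocabulary_size, n):
    # Upper bound on the number of distinct n-grams over X words: X ** n.
    return vocabulary_size ** n

X = 10_000  # assumed vocabulary size, for illustration only
for n in (1, 2, 3):
    print(f"{n}-grams: {possible_ngrams(X, n):,}")
```

With this assumed vocabulary, the bound grows from ten thousand unigrams to one hundred million bi-grams and one trillion tri-grams.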

The Maximal Signature Match Approach

This disclosure describes solutions for the above described closest match search problems. A closest match search may, given a corpus, find for each input the members in the corpus which most closely resemble the input. Merely for example, in one embodiment, a listing may be received from a seller for publication on a network-based marketplace. The listing may include input in the form of a title that is descriptive of an item that is for sale (lease, bid, donation, etc.) on the network-based marketplace. In this embodiment, the corpus may take the form of a product catalog that includes members that correspond to products. To match the title in the listing to a product in the catalog of products, an index may be generated. The index may store “signatures” for each of the products. Once generated, the signatures may be utilized to quickly and efficiently identify the product that most closely matches the title of the listing because the index is generated such that each of the “signatures” in the product catalog corresponds to a single product. Accordingly, the “signatures” are designated as such because they signify a single product in the catalog of products. Utilizing “signatures” enables the methods and systems that are described herein to achieve a high degree of accuracy and reduce runtime resources.

The Maximal Signature Match Approach is described more fully in various embodiments as follows. The Maximal Signature Match Approach may utilize N-grams as features, with N ranging up to the number of entities in the input. For example, the entities may include tokens in the text (e.g., input). For the model value “EOS Rebel XSi,” the candidate features may be:

-   “EOS”+“Rebel”+“XSi”
-   “EOS”+“Rebel”
-   “EOS”+“XSi”
-   “Rebel”+“XSi”
-   “EOS”
-   “Rebel”
-   “XSi”

and, for the title “Canon EOS Rebel XSi,” the candidate features may be:

-   “Canon”+“EOS”+“Rebel”+“XSi”
-   “Canon”+“EOS”+“Rebel”
-   “Canon”+“EOS”+“XSi”
-   “Canon”+“Rebel”+“XSi”
-   “EOS”+“Rebel”+“XSi”
-   “Canon”+“EOS”
-   “Canon”+“Rebel”
-   “Canon”+“XSi”
-   “EOS”+“Rebel”
-   “EOS”+“XSi”
-   “Rebel”+“XSi”
-   “Canon”
-   “EOS”
-   “Rebel”
-   “XSi”
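The two candidate feature lists above are the non-empty token subsets of each value, ordered from largest to smallest. A minimal sketch of that enumeration follows, assuming whitespace tokenization; `candidate_features` is a hypothetical name.

```python
from itertools import combinations

def candidate_features(text):
    # Enumerate every non-empty, order-preserving subset of the tokens,
    # starting with the full token sequence and ending with single tokens.
    tokens = text.split()
    features = []
    for size in range(len(tokens), 0, -1):
        features.extend(combinations(tokens, size))
    return features

model_features = candidate_features("EOS Rebel XSi")
title_features = candidate_features("Canon EOS Rebel XSi")

for feature in model_features:
    print("+".join(feature))
```

An input of k distinct tokens yields 2^k − 1 candidate features, so the three-token model value gives 7 candidates and the four-token title gives 15.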

The Maximal Signature Match Approach may utilize N-grams as features because this approach indexes only “signatures,” thereby avoiding the prohibitively large memory requirements previously mentioned. Further, only signatures with scores above a certain threshold may be indexed. Specifically, a “signature” is defined as an n-gram which uniquely identifies a target (e.g., title, attribute value, etc.). For example, a camera catalog with only two Canon EOS cameras may include the following entries:

-   12345678, “Canon EOS Rebel XSi 12.2 Megapixel,” Brand=Canon, Model=EOS Rebel XSi, Resolution=12.2 Megapixel
-   12345679, “Canon EOS Digital Rebel XTi 10.1 Megapixel,” Brand=Canon, Model=EOS Digital Rebel XTi, Resolution=10.1 Megapixel

The signatures for “Model=EOS Rebel XSi” may be:

-   EOS+Rebel+XSi
-   Rebel+XSi
-   EOS+XSi
-   XSi

Note that EOS+Rebel, EOS, and Rebel are not signatures, because each also appears in the model value of the second entry and therefore does not uniquely identify the first.
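The uniqueness test that separates signatures from ordinary n-grams can be sketched as follows. This is an illustration under stated assumptions: tokens are compared case-sensitively after whitespace splitting, an n-gram is treated as an unordered token set, and the helper names (`subsets`, `signatures`) are hypothetical.

```python
from itertools import combinations

def subsets(tokens):
    # All non-empty, order-preserving token subsets, largest first.
    for size in range(len(tokens), 0, -1):
        yield from combinations(tokens, size)

def signatures(target, others):
    # Keep only the n-grams of `target` whose tokens do not all occur
    # together in any other target, i.e. those that identify it uniquely.
    other_token_sets = [set(other.split()) for other in others]
    return [
        gram for gram in subsets(target.split())
        if not any(set(gram) <= tokens for tokens in other_token_sets)
    ]

models = {12345678: "EOS Rebel XSi", 12345679: "EOS Digital Rebel XTi"}
xsi_signatures = signatures(models[12345678], [models[12345679]])

for gram in xsi_signatures:
    print("+".join(gram))
```

For the two-camera catalog above, every n-gram that contains “XSi” survives the test, while EOS+Rebel, EOS, and Rebel are rejected because the second model value contains them.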

The Maximal Signature Match Approach—Scoring and Weighting

A score may be calculated for each signature, based on the signature “coverage” of the target and the weights of the entities in the signature. Weights may be determined based on occurrence frequency. For example, assume the weights of EOS, Rebel and XSi are 0.7, 0.7 and 0.9, respectively. Then the scores of the signatures may be computed as follows:

Weight(EOS+Rebel+XSi)=(0.7+0.7+0.9)/(0.7+0.7+0.9)=1.0

Weight(Rebel+XSi)=(0.7+0.9)/(0.7+0.7+0.9)≈0.7

Weight(EOS+XSi)=(0.7+0.9)/(0.7+0.7+0.9)≈0.7

Weight(XSi)=(0.9)/(0.7+0.7+0.9)≈0.39
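The coverage calculation above can be reproduced with a few lines of code. The sketch assumes the example token weights given in the text and divides the summed weight of the tokens in a signature by the summed weight of all tokens in the target; `signature_score` is a hypothetical name.

```python
def signature_score(signature, target_tokens, weights):
    # Weighted coverage: weight of the signature's tokens divided by
    # the total weight of the target's tokens.
    covered = sum(weights[token] for token in signature)
    total = sum(weights[token] for token in target_tokens)
    return covered / total

weights = {"EOS": 0.7, "Rebel": 0.7, "XSi": 0.9}  # example weights from the text
target = ("EOS", "Rebel", "XSi")
candidates = [("EOS", "Rebel", "XSi"), ("Rebel", "XSi"), ("EOS", "XSi"), ("XSi",)]

scores = {sig: signature_score(sig, target, weights) for sig in candidates}
for sig, score in scores.items():
    print("+".join(sig), round(score, 2))
```

Rounded to two decimal places, the unrounded quotients 1.6/2.3 and 0.9/2.3 give the values 0.7 and 0.39 shown above.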

If the targets are titles, the weight of a token may be determined based on the type of attribute in which it is found, in addition to occurrence frequencies. For example, a token found in a model attribute may be given a higher weight than a token that is found in a brand attribute. Further, both tokens may be assigned a higher weight than a token not found in either of these attributes.

The Maximal Signature Match Approach—Threshold Optimization

If a threshold of 0.6 is applied to the scores for the above listed signatures (e.g., 1.0, 0.7, 0.7, and 0.39), then the signatures to index include the following:

-   EOS+Rebel+XSi (w=1.0)
-   Rebel+XSi (w=0.7)
-   EOS+XSi (w=0.7)

For run-time optimization, the set of entities which form the signatures may be further indexed alongside the signatures and their identifiers. For example, the indices may appear as follows:

-   EOS+Rebel+XSi→[12345678, 1.0]
-   Rebel+XSi→[12345678, 0.7]
-   EOS+XSi→[12345678, 0.7]

and

-   EOS
-   Rebel
-   XSi
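A sketch of how the two indices above might be built from scored signatures is given below. It is illustrative only: the choice of `frozenset` keys, the use of `>=` for the threshold comparison, and the name `build_indexes` are assumptions rather than details from the disclosure.

```python
def build_indexes(scored_signatures, threshold):
    # signature_index: token set -> (product identifier, score)
    # token_index: every token that appears in at least one indexed signature
    signature_index = {}
    token_index = set()
    for product_id, signature, score in scored_signatures:
        if score >= threshold:  # assumed comparison; the text only says "threshold"
            signature_index[frozenset(signature)] = (product_id, score)
            token_index.update(signature)
    return signature_index, token_index

scored = [
    (12345678, ("EOS", "Rebel", "XSi"), 1.0),
    (12345678, ("Rebel", "XSi"), 0.7),
    (12345678, ("EOS", "XSi"), 0.7),
    (12345678, ("XSi",), 0.39),
]
signature_index, token_index = build_indexes(scored, threshold=0.6)

print(len(signature_index), sorted(token_index))
```

The single-token signature XSi falls below the threshold of 0.6 and is left out of the signature index, yet XSi remains in the token index because it occurs in the longer signatures.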

The Maximal Signature Match Approach—Maximal Signature Match Summary

At run-time, given a listing, a system, for which an embodiment is shown in FIG. 4 and described below, may identify the longest signatures that may be found in the listing. For example, suppose the system is to identify the model of an input listing for a camera by using the title of the listing as input, and the title of the listing is:

“New Canon Digital Rebel XSi, Great Deal!”

Using the token index, the system may extract the set of tokens that are found in any of the signatures:

“Rebel,” “XSi”

The system may then create all possible signatures from this set, starting from the whole set and continuing with the next biggest subset until all possible signatures are identified. The system may then determine whether any of the respective signatures are found in the signature index. For this example, the set of all possible signatures includes the signature “Rebel”+“XSi.” The signature “Rebel”+“XSi” is the longest signature and is also found in the signature index. Accordingly, the system is done in the first lookup, with the result=[12345678, 0.7].
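The run-time lookup described above can be sketched as follows. The index contents are copied from the example; the regular-expression tokenizer, the `frozenset` keys, and the function name `maximal_signature_match` are assumptions made for illustration, and the sketch simply returns the first hit among subsets of equal size.

```python
import re
from itertools import combinations

# Example indices from the text, keyed by unordered token sets (an assumption).
SIGNATURE_INDEX = {
    frozenset({"EOS", "Rebel", "XSi"}): (12345678, 1.0),
    frozenset({"Rebel", "XSi"}): (12345678, 0.7),
    frozenset({"EOS", "XSi"}): (12345678, 0.7),
}
TOKEN_INDEX = {"EOS", "Rebel", "XSi"}

def maximal_signature_match(title):
    # Hypothetical tokenizer: runs of letters, digits and dots.
    tokens = re.findall(r"[A-Za-z0-9.]+", title)
    # Keep only tokens that occur in some indexed signature, without duplicates.
    known = [t for t in dict.fromkeys(tokens) if t in TOKEN_INDEX]
    # Try the whole set first, then progressively smaller subsets.
    for size in range(len(known), 0, -1):
        for subset in combinations(known, size):
            hit = SIGNATURE_INDEX.get(frozenset(subset))
            if hit is not None:
                return hit
    return None

result = maximal_signature_match("New Canon Digital Rebel XSi, Great Deal!")
print(result)
```

Because the token index reduces the title to “Rebel” and “XSi” before any subsets are formed, the first lookup already succeeds, which mirrors the single-lookup result described above.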

Platform Architecture

FIG. 4 is a network diagram depicting a system 10, according to one exemplary embodiment of the present disclosure, having a client-server and a peer-to-peer architecture. A social networking system facilitates shopping activity, in the exemplary form of a network-based marketplace 12 communicating over a network 14. The network-based marketplace 12 communicates in a client-server architecture with clients. The network-based marketplace 12 provides server-side functionality, via the network 14 (e.g., the Internet) to one or more client machines 20 and 22. FIG. 4 illustrates, for example, a web client 16 (e.g., a browser, such as the Internet Explorer browser developed by Microsoft Corporation of Redmond, Wash. State), and a programmatic client 18 executing on respective client machines 20 and 22.

Turning to the network-based marketplace 12, an application program interface (API) server 24 and a web server 26 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 28. The application servers 28 host one or more marketplace applications 30 and payment applications 32. The application servers 28 are, in turn, shown to be coupled to one or more database servers 34 that facilitate access to one or more databases 36.

The marketplace applications 30 provide a number of marketplace functions and services to users that access the network-based marketplace 12. The payment applications 32 likewise provide a number of payment services and functions to users. For example, the payment applications 32 may allow users to quantify for, and accumulate, value (e.g., in a commercial currency, such as the U.S. dollar, or a proprietary currency, such as “points”) in accounts, and then to later redeem the accumulated value for products (e.g., goods or services) that are made available via the marketplace applications 30. While the marketplace and payment applications 30 and 32 are shown in FIG. 4 to both form part of the network-based marketplace 12, it will be appreciated that, in alternative embodiments of the present disclosure, the payment applications 32 may form part of a payment service that is separate and distinct from the network-based marketplace 12. The network-based marketplace 12 may be embodied as Ebay, The Worlds Online Marketplace®, provided by Ebay, Inc. of San Jose, Calif.

Further, while the system 10 shown in FIG. 4 employs a client-server architecture and a peer-to-peer architecture, the present disclosure is, of course, not limited to such an architecture and could equally well find application in any combination of client-server, distributed, or peer-to-peer architecture systems. The various marketplace and payment applications 30 and 32 could also be implemented as standalone software programs, which do not necessarily have networking capabilities.

The web client 16, it will be appreciated, accesses the various marketplace and payment applications 30 and 32 via the web interface supported by the web server 26. Similarly, the programmatic client 18 accesses the various services and functions provided by the marketplace and payment applications 30 and 32 via the programmatic interface provided by the API server 24. The programmatic client 18 may be, for example, a seller application (e.g., the TurboLister application developed by Ebay Inc., of San Jose, Calif.) to enable sellers to author and manage listings (e.g., items) on the network-based marketplace 12 in an off-line manner, and to perform batch-mode communications between the programmatic client 18 and the network-based marketplace 12.

FIG. 4 also illustrates a third party application 38, executing on a third party server machine 40, as having programmatic access to the network-based marketplace 12 via the programmatic interface provided by the API server 24. For example, the third party application 38 may, utilizing information retrieved from the network-based marketplace 12, support one or more features or functions on a website hosted by the third party. The third party website may, for example, provide one or more promotional, marketplace or payment functions that are supported by the relevant applications of the network-based marketplace 12.

Marketplace Applications

FIG. 5 is a block diagram illustrating multiple marketplace and paymentapplications 30 and 32 that, in one exemplary embodiment of the presentdisclosure, are provided as part of the network-based marketplace 12.The network-based marketplace 12 may provide a number of listing andprice-setting mechanisms whereby a seller may list goods or services forsale, a buyer can express interest in or indicate a desire to purchasesuch goods or services, and a price can be set for a transactionpertaining to the goods or services. To this end, the marketplaceapplications 30 are shown to include one or more auction applications 44which support auction-format listing and price setting mechanisms (e.g.,English, Dutch, Vickrey, Chinese, Double, Reverse auctions, etc.). Thevarious auction applications 44 may also provide a number of features insupport of such auction-format listings, such as a reserve price featurewhereby a seller may specify a reserve price in connection with alisting and a proxy-bidding feature whereby a bidder may invokeautomated proxy bidding.

A number of fixed-price applications 46 support fixed-price listingformats (e.g., the traditional classified advertisement-type listing ora catalogue listing) and buyout-type listings. Specifically, buyout-typelistings (e.g., including the Buy-It-Now (BIN) technology developed byEbay Inc., of San Jose, Calif.) may be offered in conjunction with anauction-format listing, and may allow a buyer to purchase goods orservices which are also being offered for sale via an auction for afixed-price that is typically higher than the starting price of theauction.

Store applications 48 allow sellers to group their listings within a“virtual” store, which may be branded and otherwise personalized by andfor the sellers. Such a virtual store may also offer promotions,incentives and features that are specific and personalized to a relevantseller.

Reputation applications 50 allow parties that transact utilizing thenetwork-based marketplace 12 to establish, build and maintainreputations, which may be made available and published to potentialtrading partners. Consider that where, for example, the network-basedmarketplace 12 supports person-to-person trading, users may have nohistory or other reference information whereby the trustworthiness andcredibility of potential trading partners may be assessed. Thereputation applications 50 allow a user, for example through feedbackprovided by other transaction partners, to establish a reputation withinthe network-based marketplace 12 over time. Other potential tradingpartners may then reference such a reputation for the purposes ofassessing credibility and trustworthiness.

Personalization applications 52 allow users of the network-basedmarketplace 12 to personalize various aspects of their interactions withthe network-based marketplace 12. For example a user may, utilizing anappropriate personalization application 52, create a personalizedreference page on which information regarding transactions to which theuser is (or has been) a party may be viewed. Further, a personalizationapplication 52 may enable a user to personalize listings and otheraspects of their interactions with the network-based marketplace 12 andother parties.

Internationalization applications 54 may support a number ofmarketplaces that are customized, for example, for specific geographicregions. A version of the network-based marketplace 12 may be customizedfor the United Kingdom, whereas another version of the network-basedmarketplace 12 may be customized for the United States. Each of theseversions may operate as an independent marketplace, or may be customized(or internationalized) presentations of a common underlying marketplace.

Navigation of the network-based marketplace 12 may be facilitated by oneor more navigation applications 56. For example, a search applicationenables key word searches of listings published via the network-basedmarketplace 12. A browse application allows users to browse variouscategory, catalogue, or inventory data structures according to whichlistings may be classified within the network-based marketplace 12.Various other navigation applications may be provided to supplement thesearch and browsing applications.

In order to make listings, available via the network-based marketplace12, as visually informing and attractive as possible, the marketplaceapplications 30 may include one or more imaging applications 58 whichusers may utilize to upload images for inclusion within listings. Theimaging applications 58 also operate to incorporate images within viewedlistings. The imaging applications 58 may also support one or morepromotional features, such as image galleries that are presented topotential buyers. For example, sellers may pay an additional fee to havean image included within a gallery of images for promoted items.

Listing creation applications 60 allow sellers to conveniently authorlistings pertaining to goods or services that they wish to transact viathe network-based marketplace 12, and listing management applications 62allow sellers to manage such listings. Specifically, where a particularseller has authored and/or published a large number of listings, themanagement of such listings may present a challenge. The listingmanagement applications 62 provide a number of features (e.g.,auto-relisting, inventory level monitors, etc.) to assist the seller inmanaging such listings.

One or more post-listing management applications 64 also assist sellerswith a number of activities that typically occur post-listing. Forexample, upon completion of an auction facilitated by one or moreauction applications 44, a seller may wish to leave feedback regarding aparticular buyer. To this end, a post-listing management application 64may provide an interface to one or more reputation applications 50, soas to allow the seller to conveniently provide feedback regardingmultiple buyers to the reputation applications 50. In addition, thepost-listing management applications 64 may facilitate the tracking andorganization of listings for a user by maintaining lists of selectlistings. For example, the lists may include watch information, woninformation, lost information, selling information, sold information andunsold information.

Dispute resolution applications 66 provide mechanisms whereby disputesarising between transacting parties may be resolved. For example, thedispute resolution applications 66 may provide guided procedures wherebythe parties are guided through a number of steps in an attempt to settlea dispute. In the event that the dispute cannot be settled via theguided procedures, the dispute may be escalated to a third partymediator or arbitrator.

A number of fraud prevention applications 68 implement various frauddetection and prevention mechanisms to reduce the occurrence of fraudwithin the marketplace 12.

Messaging applications 70 are responsible for the generation anddelivery of messages to users of the network-based marketplace 12, withsuch messages, for example, advising users regarding the status oflistings at the network-based marketplace 12 (e.g., providing “outbid”notices to bidders during an auction process or providing promotionaland merchandising information to users).

Merchandising applications 72 support various merchandising functionsthat are made available to sellers to enable sellers to increase salesvia the network-based marketplace 12. The merchandising applications 72also operate the various merchandising features that may be invoked bysellers, and may monitor and track the success of merchandisingstrategies employed by sellers.

The network-based marketplace 12 itself, or one or more parties thattransact via the network-based marketplace 12, may operate loyaltyprograms that are supported by one or more loyalty/promotionalapplications 74. For example, a buyer may earn loyalty or promotionalpoints for each transaction established and/or concluded with aparticular seller, and may be offered a reward for which accumulatedloyalty points can be redeemed.

Listing classification applications 76 may support the classification of listings. For example, the listing classification applications 76 may be utilized to generate an index that stores “signatures” that correspond to products. Once generated, according to one embodiment, the “signatures” in the index may be utilized to quickly and efficiently identify a product from a catalogue that most closely matches a particular listing and to classify the listing according to the identified product.

FIG. 6A is a block diagram illustrating listing classification applications 76. The listing classification applications 76 may include a product autotagger indexer module 78 and a maximum signature matching engine 80. The product autotagger indexer module 78 may be used to generate an index. The index may be used to store signatures and other information that are respectively associated with products. The maximum signature matching engine 80 may utilize the index and the signatures in the index to identify a listing as being most closely matched to a particular product based on signatures in the listing that are matched to signatures in the index.

FIG. 6B is a block diagram illustrating a product autotagger indexer module 78. The product autotagger indexer module 78 may include a corpus processing module 82 and an index generator module 84. The corpus processing module 82 may be used to process the corpus information and the index generator module 84 may be used to generate an index.

FIG. 6C is a block diagram illustrating a maximum signature matching engine 80. The maximum signature matching engine 80 may include a receiving module 86 and a processing module 88. The receiving module 86 may receive input information and identify input features in the input information. For example, the input information may include listing information for a listing that is used to offer an item for sale or auction on the network-based marketplace 12. The processing module 88 may identify features in the input information, generate input signatures based on the features, and identify members in corpus information that most closely match the input signatures by utilizing an index.

Data Structures

FIG. 7A is a high-level entity-relationship diagram, illustrating various tables 90 that may be maintained within the databases 36, and that are utilized by and support the marketplace and payment applications 30 and 32. A user table 92 contains a record for each registered user of the network-based marketplace 12, and may include identifiers, address information, financial information, and account information pertaining to each such registered user. A user may, it will be appreciated, operate as a seller, a buyer, or both, within the network-based marketplace 12. In one example embodiment of the present disclosure, a buyer may be a user that has accumulated value (e.g., commercial or proprietary currency), and is then able to exchange the accumulated value for items that are offered for sale by the network-based marketplace 12.

The tables 90 also include an items table 94 in which are maintained item records for listings of goods and services that are available to be, or have been, transacted via the network-based marketplace 12. Each item record within the items table 94 may furthermore be linked to one or more user records within the user table 92, so as to associate a seller and one or more actual or potential buyers with each item record.

A transaction table 96 contains a record for each transaction (e.g., a purchase transaction) pertaining to items for which records exist within the items table 94.

An order table 98 is populated with order records, with each order record being associated with an order. Each order, in turn, may be associated with one or more transactions for which records exist within the transaction table 96.

Bid records within a bids table 100 each relate to a bid received at the network-based marketplace 12 in connection with an auction-format listing supported by an auction application 44. A feedback table 102 is utilized by one or more reputation applications 50, in one example embodiment, to construct and maintain reputation information concerning users. In one embodiment, the reputation information may include feedback records associated with transactions. A history table 104 maintains a history of transactions to which a user has been a party. One or more attributes tables 106 record attribute information pertaining to items for which records exist within the items table 94. Considering only a single example of such an attribute, the attributes tables 106 may indicate a currency attribute associated with a particular item, with the currency attribute identifying the currency of a price for the relevant item as specified by a seller.

The tables 90 are further shown to include index generation information 110 and an index 112. The index generation information 110 may include corpus information 114 and standard information 116. For example, the corpus information 114 and the standard information 116 may include information for a product catalog that includes multiple products that may be offered for sale or auction on the network-based marketplace 12. The index 112 may be used to process input information to efficiently identify the most closely matching members in the corpus information 114. It will be appreciated that other embodiments may include multiple entries of index generation information 110 corresponding to different types of products, documents, categories, and so forth.

FIG. 7B is a block diagram illustrating an items table 94, according to an embodiment. The items table 94 may include multiple entries of listing information 118. Each entry may correspond to a listing of an item or service that is offered for sale on the network-based marketplace 12.

FIG. 7C is a block diagram illustrating listing information 118, according to an embodiment. The listing information 118 may include input information 121 and a product identifier 123. The input information 121 may be communicated to a maximum signature matching engine 80 that identifies the most closely matched product in a product catalog based on the input information 121, and may store a product identifier 123 that corresponds to the product in the listing information 118. The input information 121 is shown to include a title that may include alphanumeric text, a description that may include alphanumeric text, a picture, an illustration, an item identifier that uniquely distinguishes the listing from other listings in the items table 94 and, optionally, one or more name-value pairs. For example, a name-value pair may include PRICE=5.00, COLOR=blue, or other name-value pairs. It will be appreciated that other embodiments may include other input information 121.

FIG. 8A is a block diagram illustrating corpus information 114, according to an embodiment. The corpus information 114 may be embodied as a product catalogue. Other embodiments may include a set of documents, a catalog of places, a catalog of services, and so forth. The corpus information 114 may include multiple entries of member information 122 that, in the present embodiment, correspond to different products. Each entry of the member information 122 may include a product identifier 123 that identifies one product from another product in the corpus information 114, as well as text that describes the product, numeric information such as a price of the product or specifications of the product, pictures of the product, illustrations of the product, or any other information that may be descriptive of the product.

FIG. 8B is a block diagram illustrating standard information 126, according to an embodiment. The standard information 126 may include listing test information 128. Each entry of listing test information 128 may include information that is descriptive of an item or service that may be offered for sale on the network-based marketplace 12 as well as a test score 130. The test score 130 may be utilized to evaluate the precision of the maximum signature matching engine 80. For example, the maximum signature matching engine 80 may receive and process an entry of the listing test information 128 to generate a score for comparison with the corresponding test score 130.

FIG. 9A is a block diagram illustrating an entity set 140, according to an embodiment. An entity set 140 may be generated for each member in the corpus information 114. The entity set 140 may include one or more entries of entity information 141. The entity set 140 may be generated by scanning member information 122 in the corpus information 114, identifying entities 142 in the particular member, and assigning entity weights 144 to the respective entities 142 (e.g., tokens, phrases of words, pictures, URLs, etc.). For example, the entity 142 may be embodied as a word or acronym that has been parsed from the member information 122. Further, for example, the entity weight 144 for the entity 142 may be determined based on an occurrence frequency of the entity 142 in the member information 122. Other embodiments may utilize other methods to compute the entity weight 144.
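
By way of illustration only, the generation of an entity set 140 may be sketched as follows. The function name build_entity_set, the whitespace tokenization, and the use of relative frequency as the entity weight 144 are assumptions made for this sketch; the disclosure states only that the weight may be based on an occurrence frequency.

```python
from collections import Counter


def build_entity_set(member_text):
    """Return {entity: weight} for one member (hypothetical helper)."""
    # Assumption: entities are lowercase, whitespace-delimited tokens.
    tokens = member_text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    # Assumption: the weight is the relative occurrence frequency
    # of the entity within this member's information.
    return {entity: count / total for entity, count in counts.items()}


entity_set = build_entity_set("Acme Phone X2 black phone")
```

In this sketch the token "phone" occurs twice among five tokens and therefore receives a larger weight than the tokens that occur once.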

FIG. 9B is a block diagram illustrating a feature set 150, according to an embodiment. A feature set 150 may be generated for each member in the corpus information 114. The feature set 150 may include one or more entries of feature information 151. The feature information 151 may be generated by forming possible combinations of entities 142 taken from a particular entity set 140, as described above. The feature information 151 may include a feature 152 and a feature score 154 that corresponds to the particular feature 152. The feature 152 may include one or more entities 142 notwithstanding two entities 142 being illustrated in the feature 152 in FIG. 9B. The feature score 154 may be determined by summing the entity weights 144 that correspond to the entities 142 in the feature 152. In some embodiments, feature information 151 associated with a feature score 154 that is less than a predetermined threshold may be removed from the feature set 150.
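
By way of illustration only, the formation and scoring of a feature set 150 may be sketched as follows. The function name build_feature_set, the max_size limit on the number of entities per feature, and the example weights and threshold are assumptions for this sketch and are not taken from the disclosure.

```python
from itertools import combinations


def build_feature_set(entity_weights, max_size=2, threshold=0.0):
    """Return {feature: score}, where a feature is a tuple of entities."""
    features = {}
    entities = sorted(entity_weights)
    for size in range(1, max_size + 1):
        for combo in combinations(entities, size):
            # The feature score is the sum of the weights of its entities.
            score = sum(entity_weights[e] for e in combo)
            # Features scoring below the threshold are left out.
            if score >= threshold:
                features[combo] = score
    return features


# Hypothetical entity weights for one member.
weights = {"acme": 0.2, "phone": 0.4, "x2": 0.2}
feature_set = build_feature_set(weights, max_size=2, threshold=0.5)
```

With these example values, only the two-entity features that contain "phone" reach the threshold and remain in the feature set.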

FIG. 9C is a block diagram illustrating a candidate signature set 160, according to an embodiment. The candidate signature set 160 may include one or more entries of candidate signature information 161. A candidate signature set 160 may be generated for each member in the corpus information 114. The candidate signature set 160 may be generated by forming all possible N-grams from the features 152 of a particular feature set 150. The candidate signature information 161 may include a candidate signature 162 and a candidate signature score 164 that corresponds to the candidate signature 162. The candidate signature 162 may include one or more features 152, notwithstanding two features 152 being illustrated in the candidate signature 162 in FIG. 9C. The candidate signature score 164 may be determined by summing the feature scores 154 that correspond to the features 152 and by dividing the sum of the feature scores 154 by a value that represents the sum of the features that completely cover the particular member information 122 that corresponds to the feature set 150. In some embodiments, candidate signature information 161 associated with a candidate signature score 164 that is less than a predetermined threshold may be removed from the candidate signature set 160.
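
By way of illustration only, the formation of candidate signatures 162 as N-grams of features 152, together with the coverage-based candidate signature score 164, may be sketched as follows. The function name build_candidate_signatures, the max_n limit, and the reading of "the sum of the features that completely cover the member information" as the sum of all feature scores for the member are assumptions for this sketch.

```python
def build_candidate_signatures(features, max_n=3, threshold=0.0):
    """features: ordered list of (feature, score) pairs for one member."""
    # Assumption: full coverage equals the sum of all feature scores.
    total = sum(score for _, score in features)
    if total == 0:
        return {}
    candidates = {}
    for n in range(1, max_n + 1):
        for start in range(len(features) - n + 1):
            window = features[start:start + n]  # n consecutive features
            signature = tuple(name for name, _ in window)
            # Score is the fraction of the member covered by the signature.
            score = sum(s for _, s in window) / total
            if score >= threshold:
                candidates[signature] = score
    return candidates


# Hypothetical features and scores for one member.
features = [("acme", 1.0), ("phone", 2.0), ("x2", 1.0)]
candidates = build_candidate_signatures(features, max_n=2, threshold=0.5)
```

Here the single-feature signatures for "acme" and "x2" each cover one quarter of the member and fall below the example threshold, while the remaining signatures are retained.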

FIG. 9D is a block diagram illustrating an index signature set 170, according to an embodiment. The index signature set 170 may include one or more entries of index signature information 171. An index signature set 170 may be generated for each member in the corpus information 114. The index signature set 170 may be generated by identifying the candidate signatures 162 in the candidate signature set 160 for a particular member (e.g., Product 1) that do not appear in the candidate signature sets 160 respectively associated with the other members (e.g., Products 2-N) in the corpus information 114. Accordingly, the index signature set 170 includes index signatures 172 that are unique to the particular member and not found in the other members in the corpus information 114. The index signature information 171 may include an index signature 172 and an index signature score 174 that corresponds to the index signature 172. The index signature 172 may include one or more features 152, notwithstanding the three features 152 being illustrated in the index signature 172 in FIG. 9D. The index signature score 174 may be determined by summing the feature scores 154 that correspond to the three features 152 and by dividing the sum of the feature scores by a value that represents the sum of the features that completely cover the particular member information 122 that corresponds to the feature set 150.
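
By way of illustration only, the selection of index signatures 172 that are unique to a member may be sketched as follows. The function name select_index_signatures, the member identifiers, and the example scores are hypothetical.

```python
from collections import Counter


def select_index_signatures(candidate_sets):
    """candidate_sets: {member_id: {signature: score}}."""
    # Count how many members each candidate signature appears in.
    owners = Counter(
        sig for sigs in candidate_sets.values() for sig in sigs
    )
    # Keep only the signatures that appear in exactly one member.
    return {
        member: {s: sc for s, sc in sigs.items() if owners[s] == 1}
        for member, sigs in candidate_sets.items()
    }


# Hypothetical candidate signature sets for two products.
candidate_sets = {
    "product-1": {("acme", "phone"): 0.75, ("phone",): 0.5},
    "product-2": {("zenith", "phone"): 0.75, ("phone",): 0.5},
}
index_sets = select_index_signatures(candidate_sets)
```

In this example the signature ("phone",) appears in both products and is dropped from both, so each product retains only the signature that signifies it alone.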

FIG. 10A is a block diagram illustrating index information 180, according to an embodiment. The index information 180 may be generated by the product autotagger indexer module 78. The index information 180 may include a time stamp 182, score mapping parameters 183, and one or more indexes 184. The time stamp 182 may record the time the index information 180 was generated. The score mapping parameters 183 may be generated and stored with the generation of the index(es) 184. For example, the product autotagger indexer module 78 may generate an index 184 based on corpus information 114 and invoke the maximum signature matching engine 80 to process the listing test information 128 included in the standard information 126 that corresponds to the corpus information 114. The results (e.g., a product identifier 123 and a confidence score for each listing test information 128) returned by the maximum signature matching engine 80 may be compared to the test scores 130 provided in the standard information 126 and evaluated to generate a sequence of 2-tuples of confidence-score-threshold and precision-percentage. This sequence of 2-tuples may be used to generate a mapping from the confidence score to an estimated precision percentage as a third-degree polynomial, using the “least-squares fit” method, according to an embodiment. The resulting score mapping parameters 183 may be stored in the index 184. The indexes 184 may be respectively generated for each pair of corpus information 114 and standard information 126.
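
By way of illustration only, a least-squares fit of a third-degree polynomial to a sequence of 2-tuples may be sketched as follows. The sketch solves the normal equations by Gaussian elimination; the function names fit_cubic and estimate_precision are hypothetical, and the example pairs are invented for illustration and are not measured results.

```python
def fit_cubic(points):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 + c3*x**3."""
    n = 4  # number of coefficients in a third-degree polynomial
    # Normal equations A c = b: A[i][j] = sum(x**(i+j)), b[i] = sum(y*x**i).
    a = [[sum(x ** (i + j) for x, _ in points) for j in range(n)]
         for i in range(n)]
    b = [sum(y * x ** i for x, y in points) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


def estimate_precision(coeffs, confidence):
    """Evaluate the fitted polynomial at a confidence score."""
    return sum(c * confidence ** i for i, c in enumerate(coeffs))


# Invented (confidence threshold, precision) pairs, for illustration only.
pairs = [(0.1, 0.62), (0.3, 0.70), (0.5, 0.81), (0.7, 0.90), (0.9, 0.97)]
score_mapping_parameters = fit_cubic(pairs)
```

The sketch requires at least four distinct confidence-score thresholds; with fewer, the normal equations are singular. A numerical library routine would normally be preferred over hand-written elimination, which is used here only to keep the sketch self-contained.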

FIG. 10B is a block diagram illustrating an index 184, according to an embodiment. The index 184 may be generated by the product autotagger indexer module 78 based on corpus information 114. The index 184 may include score threshold information 185, product data information 186, duplicate information 188, feature set information 190, and index signature set information 192. The score threshold information 185 may be a predetermined threshold below which a signature is discarded.

The product data information 186 may include a product identifier 123 and price for each member information 122 (e.g., product) in the corpus information 114. The duplicate information 188 may include a mapping of member information 122 (e.g., products) with the same titles. The feature set information 190 may include the feature sets 150 respectively corresponding to member information 122 (e.g., products), as previously described in FIG. 9B. The index signature set information 192 may include index signature sets 170 respectively corresponding to the member information 122 (e.g., products), as previously described in FIG. 9D.

FIG. 11A is a block diagram illustrating input information 121, according to an embodiment. The input information 121 may have been extracted from listing information 118. For example, input information 121 may include a title. In another embodiment, the input information 121 may include one or more name-value pairs. The input information 121 may include input entities 202 (e.g., tokens, phrases of words, URLs, pictures, etc.).

FIG. 11B is a block diagram illustrating an input feature 204, according to an embodiment. The input feature 204 may include one or more input entities 202. The example illustrates three input entities 202; however, more or fewer input entities 202 may be included in a particular input feature 204.

FIG. 11C is a block diagram illustrating an input signature 206, according to an embodiment. The input signature 206 may include one or more input features 204. The example illustrates three input features 204; however, more or fewer input features 204 may be included in a particular input signature 206.

FIG. 12 is a block diagram illustrating method 300, according to an embodiment, to generate an index 184 (not shown) for a closest match search. The method 300 commences at operation 302 with the corpus processing module 82 receiving or accessing the corpus information 114 and the standard information 116. For example, the corpus information 114 and the standard information 116 may be for a catalog of products that are offered for sale on the network-based marketplace 12. The corpus processing module 82 may parse the respective member information 122 (e.g., product) in the corpus information 114. In one embodiment, the corpus processing module 82 may identify products with the same title and store the product identifiers 123 of such products in the duplicate information 188 in the index 184. Further, the corpus processing module 82 may extract the price from the member information 122 for each product and store the price with the corresponding product identifier 123 in the product data information 186 in the index 184.

At operation 304, the corpus processing module 82 may generate features 152. The corpus processing module 82 may generate features 152 by identifying an entity set 140 for each of the respective member information 122 that, in turn, is used to generate a feature set 150 for each of the respective member information 122. For example, the corpus processing module 82 may identify and tokenize a title respectively included in each of the member information 122. Other embodiments may identify entities 142 in other identified components of the member information 122. For example, the corpus processing module 82 may identify a set of name-value pairs included in each of the member information 122. The corpus processing module 82 may further identify the entity set 140 by filtering “stop words” from the entities 142. For example, “stop words” may include words without distinctive value such as “the,” “or,” etc. The corpus processing module 82 may further identify the entity set 140 by normalizing the entities 142. For example, the corpus processing module 82 may select a single entity 142 to represent other entities 142 that are identified as semantically equivalent. The corpus processing module 82 may further identify the entity set 140 by removing the entities 142 that were extracted from the title (e.g., tokens) that match the entities 142 extracted from name-value pairs. The corpus processing module 82 may utilize the entity set 140 to generate the feature set 150, as previously described. The feature set 150 may include entries of feature information 151 that are generated by forming every possible combination of entities 142 in a particular entity set 140.
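
By way of illustration only, the identification of an entity set 140 at operation 304 (tokenizing, filtering stop words, normalizing, and removing title entities that match name-value entities) may be sketched as follows. The stop word list, the synonym table, and the function names are hypothetical and are not taken from the disclosure.

```python
# Hypothetical stop word list and normalization table.
STOP_WORDS = {"the", "or", "a", "and", "for"}
SYNONYMS = {"blk": "black", "cellphone": "phone"}


def normalize(token):
    """Map semantically equivalent entities to one representative."""
    token = token.lower()
    return SYNONYMS.get(token, token)


def identify_entity_set(title, name_value_pairs):
    """Return the ordered entities for one member."""
    # Entities taken from the values of the name-value pairs.
    pair_entities = [normalize(v) for v in name_value_pairs.values()]
    title_entities = []
    for token in title.lower().split():
        if token in STOP_WORDS:
            continue  # filter words without distinctive value
        token = normalize(token)
        # Drop title tokens that duplicate a name-value entity
        # or an earlier title token.
        if token in pair_entities or token in title_entities:
            continue
        title_entities.append(token)
    return pair_entities + title_entities


entities = identify_entity_set(
    "The Acme cellphone X2 blk", {"COLOR": "black"}
)
```

In this example the title token "blk" normalizes to "black", which already appears as a name-value entity, so it is not repeated in the entity set.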

At operation 306, the corpus processing module 82 may generate feature scores 154 for each of the features 152, as previously described. At operation 308, the corpus processing module 82 may store the features 152 and the respective feature scores 154 as feature sets 150, according to the particular member, in the feature set information 190 in the index 184.

At operation 312, the index generator module 84 may remove feature information 151 from the feature sets 150. For example, the index generator module 84 may remove features 152 respectively associated with feature scores 154 that are less than a predetermined threshold.

At operation 314, the index generator module 84 may generate candidate signatures 162 based on the remaining feature information 151. For example, the index generator module 84 may generate a candidate signature set 160 for each of the member information 122 in the corpus information 114. The candidate signatures 162 in the candidate signature set 160 may be generated from the feature set 150 for the particular member information 122. As previously described, a candidate signature 162 may include an individual feature 152 or a combination of consecutive features 152 that forms a new candidate signature 162.

At operation 316, the index generator module 84 may generate candidate signature scores 164 for each of the candidate signatures 162. The index generator module 84 may generate candidate signature scores 164 according to the coverage of the associated candidate signature 162 over the corresponding member information 122 (e.g., product). For example, the index generator module 84 may generate a candidate signature score 164 by summing the feature scores 154 associated with each of the features 152 in the candidate signature 162 and dividing by the sum of feature scores 154 that cover the entire member information 122 (e.g., product).

At operation 318, the index generator module 84 may remove candidate signature information 161 from the respective candidate signature sets 160. For example, the index generator module 84 may remove candidate signatures 162 from each of the candidate signature sets 160 that are associated with a candidate signature score 164 less than a predetermined threshold.

At operation 322, the index generator module 84 may identify index signatures 172 for each member information 122 (e.g., product) in the corpus information 114. The index generator module 84 may identify index signatures 172 for a particular member information 122 (e.g., product) by removing candidate signatures 162 from the candidate signature set 160 for the particular member information 122 (e.g., first plurality of candidate signatures) that also appear in candidate signature sets 160 for the remaining member information 122 (e.g., second plurality of candidate signatures). Accordingly, the remaining candidate signatures 162 are designated index signatures 172 because the candidate signatures signify the particular member (e.g., product) by being unique to the particular member.

At operation 324, the index generator module 84 may store the index signatures 172 that are used to signify the particular member in association with index signature scores 174 as index signature set information 192 in the index 184. For example, the processing module 88 may store an index signature set 170 in the index 184 for each of the member information 122 in the corpus information 114.

FIG. 13 is a block diagram illustrating method 400, according to an embodiment, to utilize an index to identify a closest match. The method 400 commences at operation 402 with the receiving module 86 receiving input information 121 (e.g., listing) for matching against member information 122 (e.g., products) in corpus information (e.g., catalog of products). For example, the input information 121 may include a title, description, or other information for a listing of an item or service that is offered for sale on a network-based marketplace 12. At operation 404, the processing module 88 may parse the input information to identify (e.g., tokenize) one or more input entities 202, as previously described.

At operation 406, the processing module 88 may generate input features 204 based on the input entities 202. For example, the processing module 88 may generate input features 204 of one input entity 202 or by combining multiple input entities 202. In one embodiment, the input features 204 may include input entities 202 that are consecutively occurring in the input information 121. At operation 408, the processing module 88 may identify whether to remove an input feature 204 that was previously identified in the input information 121. For example, the processing module 88 may utilize the input feature 204 to look up a matching feature 152 in the feature set information 190 of the appropriate index 184. If the processing module 88 does not identify a matching feature 152, then the input feature 204 is removed. At operation 410, the processing module 88 may utilize the remaining input features 204 to generate input signatures 206. For example, the processing module 88 may generate input signatures 206 of one input feature 204 or by combining multiple input features 204. At operation 412, the processing module 88 may identify member information 122 (e.g., product) in the corpus information 114 (e.g., catalogue of products) that most closely matches the input information 121. For example, the processing module 88 may utilize the input signatures 206 to look up matching index signatures 172 in the index signature set information 192 of the appropriate index 184. The processing module 88 may identify the index signature 172 that is most closely matched from the index signatures 172 based on the index signature scores 174 associated with the index signatures that were previously identified as matched. For example, the processing module 88 may identify a particular index signature 172 as most closely matched because the associated index signature score is the highest index signature score 174. In one embodiment, the processing module 88 may identify the index signature 172 that is next most closely matched based on the next highest index signature score 174, and so on.
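
By way of illustration only, operations 406 through 412 may be sketched as follows. The data layout (features as tuples of entities, signatures as tuples of features, and an index that maps each signature to a member identifier and score), the max_n limit, and all names and values are assumptions for this sketch.

```python
def ngrams(items, max_n):
    """All runs of 1..max_n consecutive items, as tuples."""
    return [
        tuple(items[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(items) - n + 1)
    ]


def closest_matches(input_entities, known_features, index_signatures,
                    max_n=2):
    """Return (score, member_id) pairs, best match first."""
    # Operation 406: input features from consecutive input entities.
    input_features = ngrams(input_entities, max_n)
    # Operation 408: remove input features with no match in the index.
    kept = [f for f in input_features if f in known_features]
    # Operation 410: input signatures from the remaining input features.
    input_signatures = ngrams(kept, max_n)
    # Operation 412: look up index signatures and rank them by score.
    matches = []
    for signature in input_signatures:
        if signature in index_signatures:
            member_id, score = index_signatures[signature]
            matches.append((score, member_id))
    return sorted(matches, reverse=True)


# Hypothetical index contents.
known_features = {("acme",), ("phone",), ("x2",)}
index_signatures = {
    (("acme",), ("phone",)): ("product-1", 0.75),
    (("x2",),): ("product-1", 0.25),
    (("zenith",), ("phone",)): ("product-2", 0.75),
}
ranked = closest_matches(
    ["new", "acme", "phone", "x2"], known_features, index_signatures
)
```

The first entry of the returned list corresponds to the most closely matched index signature, and later entries correspond to the next most closely matched signatures. Because each index signature is unique to one member, each lookup in this sketch yields a single member identifier.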

FIG. 14 is a diagrammatic representation of a machine in the example form of a computer system 1000 within which a set of instructions, for causing the machine to perform any one or more of the methodologies discussed herein, may be executed. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment. The machine may be a server computer, a client computer, a personal computer (PC), a tablet PC, a set-top box (STB), a Personal Digital Assistant (PDA), a cellular telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute a set (or multiple sets) of instructions to perform any one or more of the methodologies discussed herein.

The example computer system 1000 includes a processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both), a main memory 1004 and a static memory 1006, which communicate with each other via a bus 1008. The computer system 1000 may further include a video display unit 1010 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 1000 also includes an alphanumeric input device 1012 (e.g., a keyboard), a cursor control device 1014 (e.g., a mouse), a disk drive unit 1016, a signal generation device 1018 (e.g., a speaker) and a network interface device 1020.

The disk drive unit 1016 includes a machine-readable medium 1022 on which is stored one or more sets of instructions (e.g., software 1024) embodying any one or more of the methodologies or functions described herein. The software 1024 may also reside, completely or at least partially, within the main memory 1004 and/or within the processor 1002 during execution thereof by the computer system 1000, with the main memory 1004 and the processor 1002 also constituting machine-readable media.

The software 1024 may further be transmitted or received over a network 1026 via the network interface device 1020.

While the machine-readable medium 1022 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “machine-readable medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that cause the machine to perform any one or more of the methodologies of the present disclosure. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical and magnetic media, and carrier wave signals.

Certain example embodiments may facilitate reduced processor loading, faster processor operation, reduced network traffic, and reduced data storage. For example, limiting an index to n-grams that are identified to be “index signatures” contributes towards reduced data storage, as previously mentioned. The reduced data storage, in turn, contributes towards reduced processor loading and faster processor operation, because the index is optimized for runtime computations. Finally, the utilization of “index signatures” increases the precision of the search results; because the search results are more precise, fewer searches are needed, thus reducing network traffic. Further, for example, the removal of features associated with feature scores below a predetermined threshold and the removal of candidate signatures associated with candidate signature scores below a predetermined threshold also contribute towards reduced data storage, leading to the reduced processor loading, faster processor operation, and reduced network traffic mentioned above.

Modules, Components and Logic

Certain embodiments are described herein as including logic or a number of modules, components or mechanisms. A module, logic, component or mechanism (hereinafter collectively referred to as a “module”) may be a tangible unit capable of performing certain operations and configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client or server computer system) or one or more components of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a “module” that operates to perform certain operations as described herein.

In various embodiments, a “module” may be implemented mechanically or electronically. For example, a module may comprise dedicated circuitry or logic that is permanently configured (e.g., within a special-purpose processor) to perform certain operations. A module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a module mechanically, in the dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Accordingly, the term “module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired) or temporarily configured (e.g., programmed) to operate in a certain manner and/or to perform certain operations described herein. Considering embodiments in which modules or components are temporarily configured (e.g., programmed), each of the modules or components need not be configured or instantiated at any one instance in time. For example, where the modules or components comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different modules at different times. Software may accordingly configure the processor to constitute a particular module at one instance of time and to constitute a different module at a different instance of time.

Modules can provide information to, and receive information from, other modules. Accordingly, the described modules may be regarded as being communicatively coupled. Where multiple of such modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the modules. In embodiments in which multiple modules are configured or instantiated at different times, communications between such modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple modules have access. For example, one module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further module may then, at a later time, access the memory device to retrieve and process the stored output. Modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

1. A system to generate an index for a closest match search, the system comprising: at least one processor and executable instructions accessible on a computer-readable medium that, when executed, cause the at least one processor to perform operations comprising: identify a plurality of index signatures based on a plurality of member information including first member information that describes a first member, the plurality of index signatures including a first plurality of index signatures associated with a first plurality of index signature scores, the first plurality of index signatures including a first index signature, the first plurality of index signature scores including a first index signature score, the first index signature score representing a percentage of coverage of the first index signature over the first member information, the first plurality of index signatures to signify the first member and not any other members, and store the first plurality of index signatures in the index to enable a closest match of input information to the first member.
 2. The system of claim 1, wherein the operations comprise compare a first plurality of candidate signature scores with a predetermined threshold.
 3. The system of claim 1, wherein the first plurality of index signatures includes a plurality of features, and wherein the plurality of features includes a first feature of the first member.
 4. The system of claim 3, wherein the first feature includes at least one entity, wherein the at least one entity includes a first entity that includes a string of text that is included in the first member information and delimited from other strings of text that are included in the first member information.
 5. The system of claim 1 wherein the operations comprise: receive listing information that includes the input information; and identify a closest match of the input information to at least one of the plurality of index signatures to identify the closest match of the input information to the first member over the other members.
 6. The system of claim 1, wherein the input information includes a string of text with at least one name-value pair.
 7. The system of claim 5, wherein the listing information is for a listing that describes an item for sale on a network-based marketplace, and wherein the input information includes a title of the listing that describes the item for sale on the network-based marketplace.
 8. The system of claim 1, wherein the member information includes a catalogue of products for sale on the network-based marketplace.
 9. The system of claim 8, wherein the first member includes a first product for sale on the network-based marketplace.
 10. A computer-implemented method to generate an index for a closest match search, the method comprising: identifying a plurality of index signatures based on a plurality of member information including first member information that describes a first member, the plurality of index signatures including a first plurality of index signatures associated with a first plurality of index signature scores, the first plurality of index signatures including a first index signature, the first plurality of index signature scores including a first index signature score, the first index signature score representing a percentage of coverage of the first index signature over the first member information, the first plurality of index signatures to signify the first member and not any of the plurality of other members; and storing, by one or more hardware processors, the first plurality of index signatures in the index, the storing to enable a closest match of input information to the first member.
 11. The computer-implemented method of claim 10, wherein the identifying the plurality of index signatures includes comparing a first plurality of candidate signature scores with a predetermined threshold.
 12. The computer-implemented method of claim 10, wherein the first plurality of index signatures includes a plurality of features, and wherein the plurality of features includes a first feature of the first member.
 13. The computer-implemented method of claim 12, wherein the first feature includes at least one entity, wherein the at least one entity includes a first entity that includes a string of text that is included in the first member information and delimited from other strings of text that are included in the first member information.
 14. The computer-implemented method of claim 10 further comprising: receiving listing information that includes the input information; and identifying a closest match of the input information to at least one of the plurality of index signatures to identify the closest match of the input information to the first member over the other members.
 15. The computer-implemented method of claim 10, wherein the input information includes a string of text with at least one name-value pair.
 16. The computer-implemented method of claim 14, wherein the listing information is for a listing that describes an item for sale on a network-based marketplace, and wherein the input information includes a title of the listing that describes the item for sale on the network-based marketplace.
 17. The computer-implemented method of claim 10, wherein the member information includes a catalogue of products for sale on the network-based marketplace.
 18. The computer-implemented method of claim 17, wherein the first member includes a first product for sale on the network-based marketplace.
 19. A machine-readable medium having no transitory signals storing a set of instructions that, when executed by one or more processors of a machine, causes the machine to perform operations comprising: identifying a plurality of index signatures based on a plurality of member information including first member information that describes a first member, the plurality of index signatures including a first plurality of index signatures associated with a first plurality of index signature scores, the first plurality of index signatures including a first index signature, the first plurality of index signature scores including a first index signature score, the first index signature score representing a percentage of coverage of the first index signature over the first member information, the first plurality of index signatures to signify the first member and not any of the plurality of other members; and storing the first plurality of index signatures in the index, the storing to enable a closest match of input information to the first member.
 20. The machine-readable medium of claim 19, wherein the identifying the plurality of index signatures includes comparing a first plurality of candidate signature scores with a predetermined threshold. 
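
By way of illustration only, and not as a limitation of the foregoing claims, the index generation recited above may be sketched as follows. The sketch makes several assumptions that are not taken from the claims: a signature is modelled as a set of one or two whitespace-delimited tokens, the percentage of coverage is computed as the share of a member's distinct tokens contained in the signature, and a signature is treated as signifying a member only when it is not contained in the member information of any other member. The names `coverage_score`, `build_index`, and `catalogue` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Tuple

Signature = FrozenSet[str]


def coverage_score(signature: Signature, member_tokens: FrozenSet[str]) -> float:
    """Percentage of the member's distinct tokens covered by the signature."""
    return 100.0 * len(signature & member_tokens) / len(member_tokens)


def build_index(
    members: Dict[str, str], threshold: float, max_size: int = 2
) -> Dict[Signature, Tuple[str, float]]:
    """Map each retained signature to (member name, coverage score)."""
    token_sets = {
        name: frozenset(text.lower().split()) for name, text in members.items()
    }
    index: Dict[Signature, Tuple[str, float]] = {}
    for name, tokens in token_sets.items():
        if not tokens:
            continue  # nothing to cover; also avoids division by zero
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(tokens), size):
                signature = frozenset(combo)
                # Assumed uniqueness rule: drop signatures that also occur
                # in the member information of any other member.
                if any(
                    signature <= other
                    for other_name, other in token_sets.items()
                    if other_name != name
                ):
                    continue
                score = coverage_score(signature, tokens)
                # Compare the candidate score with a predetermined threshold.
                if score >= threshold:
                    index[signature] = (name, score)
    return index


# Hypothetical two-product catalogue.
catalogue = {"ipod_nano": "apple ipod nano", "ipod_touch": "apple ipod touch"}
index = build_index(catalogue, threshold=50.0)
```

With this hypothetical catalogue, the candidate signature {apple, ipod} is discarded because it occurs in both products, and the single-token candidate {nano} is discarded because it covers only one third of its product's tokens, which falls below the 50 percent threshold.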